Analysis of canonical and non-canonical splice sites in mammalian genomes.
Abstract
A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (each accounting for at most 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases, we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea gains substantial support when we compare the sequences of human genes having non-canonical splice sites with those deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. After corrections, they can be classified as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG); 61 errors that were corrected to GT-AG canonical pairs; six AT-AC pairs (of which two were errors that corrected to AT-AC); one case produced from a non-existent intron; seven cases found in HTG that were deposited to GenBank; and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation holds for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and only 0.02% could consist of other types of non-canonical splice sites.
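The tallying step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `classify_pair` and `tally_percentages` are hypothetical, and the sketch assumes each intron is available as a plain string whose first and last two bases are the donor and acceptor dinucleotides. It does not handle the shifted-junction cases that motivate the eight-type classification.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

# Every donor/acceptor dinucleotide combination: 16 x 16 = 256 possible pairs.
ALL_PAIRS = [
    f"{''.join(donor)}-{''.join(acceptor)}"
    for donor in product(BASES, repeat=2)
    for acceptor in product(BASES, repeat=2)
]


def classify_pair(intron: str) -> str:
    """Label an intron by its first two and last two bases, e.g. 'GT-AG'."""
    seq = intron.upper()
    if len(seq) < 4:
        raise ValueError("intron must contain at least four bases")
    return f"{seq[:2]}-{seq[-2:]}"


def tally_percentages(introns):
    """Return the percentage of introns falling into each observed pair type."""
    counts = Counter(classify_pair(seq) for seq in introns)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Toy input (made-up sequences, for illustration only).
    toy_introns = ["GTAAGTTTCAG", "gtgagtccag", "GCAAGTTTAG", "ATATCCTTAC"]
    print(len(ALL_PAIRS))
    print(tally_percentages(toy_introns))
```

Applied to a real set of EST-supported introns, a tally of this kind would yield the sort of breakdown quoted in the abstract, although the published figures also depend on the error-correction step against HTG sequences.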
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
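As a rough illustration of what a splice-site weight matrix can look like, the following Python sketch builds a log-odds matrix from aligned, equal-length site sequences. The abstract does not specify how the matrices were constructed, so the uniform background frequency (0.25), the pseudocount, and the function names `weight_matrix` and `score` are all assumptions made for this example.

```python
import math
from collections import Counter

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a per-position log2-odds matrix from aligned site sequences.

    Assumes all sequences have the same length and contain only A, C, G, T.
    """
    if not sites:
        raise ValueError("at least one site sequence is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all site sequences must have the same length")

    matrix = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sites)
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        column = {
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        }
        matrix.append(column)
    return matrix


def score(matrix, seq):
    """Sum the log-odds weights of a candidate site of matching length."""
    if len(seq) != len(matrix):
        raise ValueError("sequence length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, seq.upper()))


if __name__ == "__main__":
    # Toy donor-site windows (made-up sequences, for illustration only).
    donors = ["GTAAGT", "GTGAGT", "GTAAGA"]
    m = weight_matrix(donors)
    print(score(m, "GTAAGT"), score(m, "CCCCCC"))
```

A gene prediction program could use such a score to rank candidate donor or acceptor positions; separate matrices would be built for each major group (for example GT-AG and GC-AG).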
Similar resources
A comprehensive survey of non-canonical splice sites in the human transcriptome
We uncovered the diversity of non-canonical splice sites at the human transcriptome using deep transcriptome profiling. We mapped a total of 3.7 billion human RNA-seq reads and developed a set of stringent filters to avoid false non-canonical splice site detections. We identified 184 splice sites with non-canonical dinucleotides and U2/U12-like consensus sequences. We selected 10 of the herein ...
Evaluation of Water Deficient Stress Tolerance in some Wheat Cultivars and Their Hybrids using Canonical Discriminant Analysis and Genotype by Trait Biplot
Canonical discriminant analysis (CDA) in combination with cluster analysis and genotype by trait (GT) biplot analysis methods were used to assess 9 wheat cultivars having different degrees of tolerance along with 36 F1 hybrids derived from partial diallel crosses, using stress tolerance indices, in two levels (well watered and cessation of irrigation at pollination stage). Cluster analysis clas...
Benthic Macroinvertabrate distribution in Tajan River Using Canonical Correspondence Analysis
The distribution of macroinvertebrate communities at 5 sampling sites on the Tajan River was used to examine the relationship between physicochemical parameters and macroinvertebrate communities, and to assess an ecological classification system as a tool for management and conservation purposes. The amount of variation explained in macroinvertebrate taxa composition is within values r...
HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data
BACKGROUND High-throughput sequencing of an organism's transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. ME...
Canonical Analysis of the Relationship between Components of Professional Ethics and Dimensions of Social Responsibility
Background: Today, professional ethics and social responsibility play an important role in organizations. This study aimed to perform a canonical analysis of the relationship between components of professional ethics and dimensions of social responsibility among first high school teachers in the Naghadeh province. Method: In terms of purpose, this study is applied, and in terms of data collec...
Journal:
- Nucleic Acids Research
Volume 28, Issue 21
Pages: -
Publication date: 2000